Reliable measures of syntactic and lexical complexity: The case of Iris Murdoch

Authors

  • Stefan Evert
  • Sebastian Wankerl
  • Elmar Nöth
Abstract

Quantitative measures of the syntactic and lexical complexity of natural language text – such as type-token ratio (TTR), Yule’s K (1944) or Yngve depth (Yngve, 1960) – play a central role in stylometric analysis. They have been used to investigate stylometric differences between writers and settle questions of disputed authorship (Stamatatos, 2009), to explore the characteristics of translated texts (Volansky, Ordan, & Wintner, 2015), to identify determinants of style in scientific writing (Bergsma, Post, & Yarowsky, 2012), to study diachronic changes in grammar (Bentz, Kiela, Hill, & Buttery, 2014), to assess the readability and difficulty level of a text (Graesser, McNamara, Louwerse, & Cai, 2004; Collins-Thompson, 2014), and as a feature in the multivariate analysis of linguistic variation (Biber, 1988; Diwersy, Evert, & Neumann, 2014). In particular, several recent studies (Garrard, Maloney, Hodges, & Patterson, 2005; Pakhomov, Chacon, Wicklund, & Gundel, 2011; Le, Lancashire, Hirst, & Jokel, 2011) attempt to detect early symptoms of dementia in the last novels written by the British author Iris Murdoch, who was diagnosed with Alzheimer’s disease in 1997. These studies focus primarily on quantitative complexity measures, based on the assumption that the onset of dementia reduces either the lexical or the syntactic complexity of a patient’s writing. Results were inconclusive: while the first two studies observed a promising decline of complexity in Murdoch’s last novel, Jackson’s Dilemma, published in 1995, Le et al. (2011) analyzed a larger sample of Murdoch’s writings and found that most of the quantitative measures did not show any clear effects. In particular, they rejected the hypothesis of a decline in syntactic complexity. Like most work in stylometry, all three studies fail to take the sampling distributions of complexity measures into account. As a result, they are prone to over-interpreting observed differences that may well be explained by random variation. Only Le et al. (2011) apply significance tests, but they test for a linear trend in complexity across the span of Murdoch’s writing career, which would not be consistent with the typical development of Alzheimer’s disease. In this paper, we propose a novel methodology for the computation of reliable confidence intervals and significance tests for measures of linguistic complexity, inspired by ideas from bootstrapping and cross-validation. As an illustration, we apply the new method to the case of Iris Murdoch, showing that most of the differences observed in previous work are not significant and can indeed be accounted for by sampling variation.
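
To make the lexical measures named in the abstract concrete, the following is a minimal Python sketch of TTR and Yule's K, together with a rough interval estimate obtained by splitting a text into equal-sized chunks. The chunking scheme, the function names, and the parameters `chunk_size` and `z` are illustrative assumptions only; the abstract does not describe the authors' procedure in detail, and this should not be read as their method.

```python
from collections import Counter
import math


def type_token_ratio(tokens):
    """TTR: number of distinct word types divided by number of tokens."""
    return len(set(tokens)) / len(tokens)


def yules_k(tokens):
    """Yule's K = 10^4 * (sum_m m^2 * V_m - N) / N^2, where V_m is the
    number of types that occur exactly m times and N is the token count."""
    n = len(tokens)
    freqs = Counter(tokens)
    # Summing f^2 over all types equals summing m^2 * V_m over all m.
    sum_sq = sum(f * f for f in freqs.values())
    return 1e4 * (sum_sq - n) / (n * n)


def chunk_interval(tokens, measure, chunk_size=1000, z=1.96):
    """Assumed scheme (not taken from the paper): evaluate `measure` on
    consecutive equal-sized chunks and return the mean together with a
    normal-approximation interval (mean, lower, upper)."""
    chunks = [tokens[i:i + chunk_size]
              for i in range(0, len(tokens) - chunk_size + 1, chunk_size)]
    if len(chunks) < 2:
        raise ValueError("need at least two full chunks")
    values = [measure(c) for c in chunks]
    k = len(values)
    mean = sum(values) / k
    var = sum((v - mean) ** 2 for v in values) / (k - 1)  # sample variance
    half_width = z * math.sqrt(var / k)
    return mean, mean - half_width, mean + half_width
```

Evaluating a measure on chunks of equal size sidesteps the well-known dependence of TTR on text length, and the spread across chunks gives a first impression of how much a measure varies within a single text before two texts are compared.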

Related articles

The Relationship between Syntactic and Lexical Complexity in Speech Monologues of EFL Learners

This study aims to explore the relationship between syntactic and lexical complexity and also the relationship between different aspects of lexical complexity. To this end, speech monologues of 35 Iranian high-intermediate learners of English on three different tasks (i.e. argumentation, description, and narration) were analyzed for correlations between one measure of sy...

Cognitive Task Complexity and Iranian EFL Learners’ Written Linguistic Performance across Writing Proficiency Levels

Recently, tasks, as the basic units of syllabi, and cognitive complexity, as the criterion for sequencing them, have caught the attention of many second language researchers. This study sought to explore the effect of cognitively simple and complex tasks on the linguistic performance of high- and low-proficiency Iranian EFL writers, i.e., fluency, accuracy, lexical complexity, and structura...

Students’ Oral Assessment Considering Various Task Dimensions and Difficulty Factors

This study investigated students’ oral performance ability accounting for various oral analytical factors including fluency, lexical and structural complexity and accuracy with each subcategory. Accordingly, 20 raters scored the oral performances produced by 200 students and a quantitative design using a MANOVA test was used to investigate students’ score differences of various levels of langua...

The Impact of Task Complexity along Single Task Dimension on EFL Iranian Learners' Written Production: Lexical complexity

Based on Robinson’s Cognition Hypothesis, this study explored the effects of task complexity on the lexical complexity of Iranian EFL students’ argumentative writing. It was designed to examine how manipulating cognitive task complexity along the +/- single task dimension (a resource-dispersing dimension in Robinson’s triadic framework) affects Iranian EFL learners’ production in terms of lexica...

Longitudinal detection of dementia through lexical and syntactic changes in writing: a case study of three British novelists

We present a large-scale longitudinal study of lexical and syntactic changes in language in Alzheimer’s disease using complete, fully parsed texts and a large number of measures, using as our subjects the British novelists Iris Murdoch (who died with Alzheimer’s), Agatha Christie (who was suspected of it), and P.D. James (who has aged healthily). We avoid the limitations and deficiencies of Gar...


Publication date: 2017